{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Training YOLOv6 on VOC dataset"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Step 1: Prepare VOC dataset"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "|  dataset | url | size  | images  |\n",
    "|  :----:  |  :----:  |:----:  | :----:  |\n",
    "| VOC2007 trainval  | [download zip](http://host.robots.ox.ac.uk/pascal/VOC/voc2007/VOCtrainval_06-Nov-2007.tar) | 446MB | 5012  \n",
    "| VOC2007 test  | [download zip](http://host.robots.ox.ac.uk/pascal/VOC/voc2007/VOCtest_06-Nov-2007.tar) | 438MB | 4953\n",
    "| VOC2012 trainval  | [download zip](http://host.robots.ox.ac.uk/pascal/VOC/voc2012/VOCtrainval_11-May-2012.tar) | 1.95GB | 17126"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Download VOC dataset and unzip them, the directory shows like:\n",
    "```\n",
    "VOCdevkit\n",
    "├── VOC2007\n",
    "│   ├── Annotations\n",
    "│   ├── ImageSets\n",
    "│   ├── JPEGImages\n",
    "│   ├── SegmentationClass\n",
    "│   └── SegmentationObject\n",
    "└── VOC2012\n",
    "    ├── Annotations\n",
    "    ├── ImageSets\n",
    "    ├── JPEGImages\n",
    "    ├── SegmentationClass\n",
    "    └── SegmentationObject\n",
    "```"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Step 2: Convert VOC dataset to YOLO-format."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The VOC dataset use xml format annotations as below. (refer to [VOC2007 guidelines](http://host.robots.ox.ac.uk/pascal/VOC/voc2007/guidelines.html))\n",
    "```\n",
    "<annotation>\n",
    "\t<folder>VOC2007</folder>\n",
    "\t<filename>000007.jpg</filename>\n",
    "\t<source>\n",
    "\t\t<database>The VOC2007 Database</database>\n",
    "\t\t<annotation>PASCAL VOC2007</annotation>\n",
    "\t\t<image>flickr</image>\n",
    "\t\t<flickrid>194179466</flickrid>\n",
    "\t</source>\n",
    "\t<owner>\n",
    "\t\t<flickrid>monsieurrompu</flickrid>\n",
    "\t\t<name>Thom Zemanek</name>\n",
    "\t</owner>\n",
    "\t<size>\n",
    "\t\t<width>500</width>\n",
    "\t\t<height>333</height>\n",
    "\t\t<depth>3</depth>\n",
    "\t</size>\n",
    "\t<segmented>0</segmented>\n",
    "\t<object>\n",
    "\t\t<name>car</name>\n",
    "\t\t<pose>Unspecified</pose>\n",
    "\t\t<truncated>1</truncated>\n",
    "\t\t<difficult>0</difficult>\n",
    "\t\t<bndbox>\n",
    "\t\t\t<xmin>141</xmin>\n",
    "\t\t\t<ymin>50</ymin>\n",
    "\t\t\t<xmax>500</xmax>\n",
    "\t\t\t<ymax>330</ymax>\n",
    "\t\t</bndbox>\n",
    "\t</object>\n",
    "</annotation>\n",
    "```"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Run the following command to convert voc dataset to yolo format:\n",
    "\n",
    "&ensp;&ensp;`python yolov6/data/voc2yolo.py --voc_path your_path/to/VOCdevkit`"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "We follow the `07+12` training setting, which means using VOC2007 and VOC2012's train+val(16551) as training set, VOC2007's test(4952) as validation set and testing set.\n",
    "\n",
    "Finally, the directory looks like:\n",
    "```\n",
    "VOCdevkit\n",
    "├── images\n",
    "├── labels\n",
    "├── voc_07_12\n",
    "│   ├── images\n",
    "│   │   ├── train\n",
    "│   │   └── val\n",
    "│   └── labels\n",
    "│       ├── train\n",
    "│       └── val\n",
    "├── VOC2007\n",
    "└── VOC2012\n",
    "```\n",
    "Where `voc_07_12` is the converted yolo-format dataset."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### Visualize yolo format dataset (Optional)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "To check if your dataset is correct, run the following command:\n",
    "\n",
    "&ensp;&ensp;`python yolov6/data/vis_dataset.py --img_dir your_path/to/VOCdevkit/images/train --label_dir your_path/to/VOCdevkit/labels/train`"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Step 3: Create dataset config file."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Create `data/voc.yaml` like:\n",
    "\n",
    "```\n",
    "# Please insure that your custom_dataset are put in same parent dir with YOLOv6_DIR\n",
    "train: your_path/to/VOCdevkit/voc_07_12/images/train # train images\n",
    "val: your_path/to/VOCdevkit/voc_07_12/images/val # val images\n",
    "test: your_path/to/VOCdevkit/voc_07_12/images/val # test images (optional)\n",
    "\n",
    "# whether it is coco dataset, only coco dataset should be set to True.\n",
    "is_coco: False\n",
    "# Classes\n",
    "nc: 20  # number of classes\n",
    "names: ['aeroplane', 'bicycle', 'bird', 'boat', 'bottle', 'bus', 'car', 'cat', 'chair', 'cow', 'diningtable', 'dog',\n",
    "        'horse', 'motorbike', 'person', 'pottedplant', 'sheep', 'sofa', 'train', 'tvmonitor']  # class names\n",
    "```"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Step 4: Training.\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Use the following command to start training:\n",
    "- Multi GPUs (DDP mode recommended)\n",
    "\n",
    "&ensp;&ensp;`python -m torch.distributed.launch --nproc_per_node 4 --master_port=23456 tools/train.py --batch 256 --conf configs/yolov6n_finetune.py --data data/voc.yaml --device 0,1,2,3`\n",
    "\n",
    "- Single GPU\n",
    "\n",
    "&ensp;&ensp;`python tools/train.py --batch 256 --conf configs/yolov6_finetune.py --data data/data.yaml --device 0`"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### Tensorboard\n",
    "We can use tensorboard to visualize the train_batch/validation predictions and loss/mAP curve, run:\n",
    "\n",
    "&ensp;&ensp;`tensorboard --logdir=your_path/to/log`\n",
    "\n",
    "![Train batch](../assets/train_batch.jpg 'Train batch')\n",
    "\n",
    "![Traing loss/mAP curve](../assets/voc_loss_curve.jpg 'Traing loss/mAP curve')"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### Evaluation\n",
    "When training finished, it automatically do evaulation on the testset, the output metrics are:\n",
    "```\n",
    "DONE (t=4.21s).\n",
    " Average Precision  (AP) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.632\n",
    " Average Precision  (AP) @[ IoU=0.50      | area=   all | maxDets=100 ] = 0.854\n",
    " Average Precision  (AP) @[ IoU=0.75      | area=   all | maxDets=100 ] = 0.702\n",
    " Average Precision  (AP) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.272\n",
    " Average Precision  (AP) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.473\n",
    " Average Precision  (AP) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.689\n",
    " Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=  1 ] = 0.518\n",
    " Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets= 10 ] = 0.737\n",
    " Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.751\n",
    " Average Recall     (AR) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.554\n",
    " Average Recall     (AR) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.656\n",
    " Average Recall     (AR) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.791\n",
    "Epoch: 399 | mAP@0.5: 0.8542516455615079 | mAP@0.50:0.95: 0.6315693468708705\n",
    "\n",
    "Training completed in 9.206 hours.\n",
    "```\n",
    "Or you can manually evaulation model on your dataset by:\n",
    "\n",
    "&ensp;&ensp;`python tools/eval.py --data data/voc.yaml  --weights your_path/to/weights/best_ckpt.pt --device 0`"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### 5.Inference\n",
    "\n",
    "&ensp;&ensp;`python tools/infer.py --weights your_path/to/weights/best_ckpt.pt --yaml data/voc.yaml --source data/images/image3.jpg --device 0`\n",
    "\n",
    "![image3.jpg](../assets/image3.jpg)\n",
    "### 6. Deployment\n",
    "\n",
    "&ensp;&ensp;`python deploy/ONNX/export_onnx.py --weights your_path/to/weights/best_ckpt.pt --device 0`"
   ]
  }
 ],
 "metadata": {
  "interpreter": {
   "hash": "31f2aee4e71d21fbe5cf8b01ff0e069b9275f58929596ceb00d14d90e3e16cd6"
  },
  "kernelspec": {
   "display_name": "Python 3.8.2 64-bit",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.7.10"
  },
  "orig_nbformat": 4
 },
 "nbformat": 4,
 "nbformat_minor": 2
}
